Eighteen mitochondrial genomes of Syrphidae (Insecta: Diptera: Brachycera) with a phylogenetic analysis of Muscomorpha

In this study, 18 mitochondrial genomes (mitogenomes) of Syrphidae were sequenced. These mitogenomes ranged from 15,648 to 16,405 bp and contained 37 genes that were similar to those from other Syrphidae species. Most protein-coding genes (PCGs) started with a standard ATN codon and ended with TAA/G. All transfer RNAs (tRNAs) could be folded into the cloverleaf secondary structure except tRNA-Ser (AGN), which lacks a dihydrouridine arm. The secondary structures of ribosomal RNAs (rRNAs) were predicted. Six domains (III is absent in arthropods) and 44 helices were included in the 16S rRNA, and three domains and 24 helices were included in the 12S rRNA. We found three conserved fragments in all syrphid mitogenomes. Phylogenetic analyses were performed based on the nucleotide data of 13 PCGs and two rRNAs from 76 Muscomorpha and three outgroup species. In results the paraphyly of Aschiza and Schizophora were supported, the Acalyptratae was also paraphyletic but the relationships of its superfamilies were difficult to determine, the monophyly of Calyptratea was supported with the relationships of Oestroidea and Muscoidea need to be further reconsidered. Within Syrphidae the monophyly of family level was supported, the Syrphinae were clustered into one branch, while the paraphyly of Eristalinae was still well supported.


Introduction
The Muscomorpha (Diptera: Brachycera), an infraorder of Brachycera, consists of two groups, Aschiza and Schizophora, plus the unranked clade Cyclorrhapha [1][2][3][4]. The monophyly of several of these groups is debated. The evolution of Aschiza has been considered one of the most challenging phylogenetic questions in dipterology [1]. Aschiza contains two superfamiles: Syrphoidea and Platypezoidea. Syrphoidea was suggested to be paraphyletic in some previous studies [1,2,5], while Li [6], Pu et al., [7] and Skevington and Yeates [8] argued Syrphoidea is monophyletic. Schizophora underwent a recent, rapid radiation of lineages, and so represents most of the family-level diversity in Diptera. Relationships

Sequencing assembly and annotation
The 18 complete mitogenomic sequences were sequenced using an Illumina NovaSeq 6000 platform and assembled with Geneious R9 [64]. Thirteen PCGs were identified using the invertebrate mitochondrial genetic code setting in ORF Finder, and PCG base composition and codon usage patterns were analyzed using MEGA 7 [65]. The locations and secondary structures of 22 tRNA genes were identified and predicted using tRNA scan-SE 1.21 [66] and ARWEN 1.2 [67]. Two rRNA genes were determined based on the locations of adjacent tRNAs genes and compared with the genes of other Syrphid mitogenomes, and the secondary structures of 16S rRNA and 12S rRNA were predicted according to data from other species [68][69][70][71][72]. Helical elements were predicted using ClustalX 1.81 and RNA Structure 5.2 [73]. The AT skew, GC skew, and A+T content were used to characterize the nucleotide composition of the mitogenomes [74,75]. Strand asymmetry was calculated using the following formulas: AT skew = (A-T) / (A+T), and GC skew = (G-C) / (G+C) [76,77].

Phylogenetic analysis
To make a Muscomorpha mitogenome phylogenetic tree, we combined our 18 Syrphidae mitogenomes with all other complete, published Syrphidae mitogenomes, as well as two complete mitogenomes from each other family of Muscomorpha as representatives (Table 1) due to too many species in some families. Phylogenetic analyses of mitogenome data are known to be susceptible to compositional bias, particularly of the third codon positions in the PCGs [60], and particularly in the Diptera [22,31]. To reduce this bias, we built maximum likelihood (ML) and Bayesian inference (BI) phylogenetic trees based on three different datasets: (1) PCGs: all position of the PCGs (11607 bp); (2) PCG12: first and second codon positions of the PCGs (7582 bp); (3) PCG12RNA: the first and the second codon positions of the PCGs and two rRNA genes (9496 bp). The third codon was omitted from the latter two datasets because the third codon positions of PCGs may suffer from mutation saturation, which can lead to noise during phylogenetic analysis [78,79]. Because tRNA genes are highly conserved, they were not considered for phylogenetic tree construction. Each of the PCGs was aligned with the MAFFT algorithm in Phylosuite [80] and translated according to the Invertebrate Mitochondrial genetic code, then the terminal codons were removed. The rRNA genes were aligned with MAFFT using the G-INS-i strategy [81], and the poorly aligned positions and divergent regions were removed using Gblocks 0.91b [82]. All sequences were assessed and manually concatenated using Phylosuite [80].
The heterogeneity of sequence divergence within the three datasets was analyzed using ALI-GROOVE with the default sliding window size [83]. The best partitioning schemes and corresponding nucleotide substitution models were determined using PartitionFinder2 in Phylosuite [80] with the corrected Akaike information criterion (AICc) and greedy search algorithm [84].
Maximum likelihood (ML) analysis was performed using IQ-TREE [85], and evaluated using the ultrafast bootstrap approximation approach for 1,000 replicates. Bayesian inference (BI) analyses were performed using MrBayes 3.3.6 in Phylosuite, with two simultaneous Markov chain runs of 10,000,000 generations with sampling every 1000 generations. The phylogenetic trees were visualized using FigTree 1.4.2.

Genome structure and nucleotide composition
The 18 new mitogenome sequences have been deposited in GenBank (Accession Numbers in Table 1). Their gene order and organization were similar to previously published Syrphidae mitogenomes, containing 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), two ribosomal RNAs (rRNAs), and a D-loop region. A total of 23 genes (14 tRNAs and nine PCGs) are encoded on the majority strand (J-strand) and 14 genes (eight tRNAs, four PCGs, and two rRNAs) on the minority strand (N-strand) (S1-S18 Tables). The whole lengths of the 18 mitogenomes ranged from 15,648 bp (Parhelophilus kurentzovi) to 16,405 bp (Epistrophe lamellata) (Figs 1 and 2). The genes of D-loop that affect the difference in whole lengths (S1-S18 Tables). The mitogenome A+T contents ranged from 81.3% to 77.7%, and they displayed a strong A+T bias, which is typical of insect mitogenomes [59,64].

Protein-coding genes
The A+T contents of the 13 PCGs combined for each species ranged from 75.3% to 79.1%. Most of the PCGs' start codons used ATN (ATA/ATT/ATC/ATG), while a few PCGs were initiated by TCG, GTG, or TTG. Among the 18 species' PCGs, COX2, COX3, ATP6, ND4L and Cytb used TAA as the stop codon, and all other genes ended with TAG or the incomplete forms TA-or T-(S1-S18 Tables).
In order to investigate the evolutionary rates of the 13 PCGs in Syrphidae, we calculated the value of Ka (non-synonymous substitutions), Ks (synonymous substitutions), and Ka/Ks for each PCG. The Ka/Ks value of all 13 PCGs were lower than 1, which indicated they are under relaxed selection pressure. The gene ATP8 showed the highest evolutionary rate (Ka/ Ks = 0.60), while COXI exhibited the lowest rate (Ka/Ks = 0.07) (Fig 3) as expected: for that reason, COXI has been used as a DNA barcode for the identification of various species [65].

Transfer and ribosomal genes
Each of the 18 syrphid species' mitogenomes included 22 tRNAs: 14 were encoded on the Jstrand and eight encoded on the N-strand. The lengths ranged from 64 to 80 bp. Their secondary structure suggests they can be folded into the typical clover-leaf structure of tRNAs, except 16S rRNAs and 12S rRNAs were both located in the N-strand. 16S rRNA was located between tRNA-L1 (UUR) and tRNA-V, and 12S rRNA was between tRNA-V and the D-loop region. The secondary structure of 16S rRNA was comprised of five domains (I, II, IV, V, and VI; domain III is absent in arthropods) and 44 helices (Fig 5), with domain IV showing higher sequence conservation than other domains (S19-S36 Figs). 12S rRNA was comprised of three domains and 24 helices, with domain III showing the highest conservation (Fig 6 and S37

Intergenic space and overlap region
Three conserved regions were found in the intergenic space and overlap regions of the Syrphidae mitogenomes. These were the overlap area between COXI and Leu1 (TCTAA), the overlap area between ATP8 and ATP6 (ATGATAA), and one intergenic space between ND1 and Leu2 (AAAACAAG).

D-loops
The D-loops of the Syrphidae mitogenomes have large variety, the length between 811 bp and 1410 bp. The gene of D-loops is the most important factors of the whole lengths of mitogenome in Syrphidae.

Phylogenetic analysis
Detecting the base heterogeneity of mitogenome datasets used for constructing a phylogenetic tree is important, as strong heterogeneity can cause major errors [86][87][88]. On the basis of the calculation results obtained from the AliGROOVE [83], the heterogeneities of the PCG, PCG12, and PCG12RNA datasets were weak (Fig 7). Hence, the three datasets could be used to construct a phylogenetic tree. The ML and BI phylogenetic trees were reconstructed based on 76 Muscomorpha species and three outgroup species using three datasets to account for third codon compositional bias (PCG12RNA, PCG12, and PCGs), with the best partitioning scheme and models selected by PartitionFinder2. These six phylogenetic trees (three ML and three BI) had generally high support, with most nodes receiving high nodal support values and only a few nodes receiving moderate or low support (Figs 7 and 8).
In all resulting trees, all the Diptera families were monophyletic with strong support, with the exception of Anthomyiidae, which was paraphyletic in the PCG12RNA-ML tree (Fig 8E). Superfamilies were not necessarily monophyletic. The superfamily Syrphoidea was only monophyletic in the PCGRNA-BI tree; in other trees, Pipunculus sp. (Syrphoidea: Pipunculidae) always clustered with Trypetoptera punctulata (Sciomyzoidea: Sciomyzidae). Except in the PCG-ML and PCG12-BI trees, the superfamily Sciomyzoidea was paraphyletic, with Nemopoda mamaevi (Sciomyzoidea: Sepsidae) inferred as sister group to the Lauxanioidea. Oestroidea was always paraphyletic, with differing relationships between families depending on the dataset used (Fig 8).
Both the BI and ML phylogenetic analyses of Syrphidae showed similar topology across all three datasets (Fig 9 and S55-S59 Figs). The monophyly of the subfamily Syrphinae was supported, within which the tribe Syrphini was clustered together with strong nodal supports, and the tribe Melanostomini was sister to Syrphini. The subfamily Eristalinae was paraphyletic in relation to Syrphinae. In particular, a monophyletic clade containing four putatively Eristalinae species appeared more closely related to the Syrphinae than the other Eristalinae, with the following topology: (((Ferdinandea cuprea + Korinchia angustiabdomena) + Volucella nigricans) + Orthonevra geniculata).

Discussion
The mitogenomes from the 18 Syrphidae species in this study had similar genome length, genome structure, and nucleotide compositions as other Syrphidae mitogenomes, with many conserved regions. The results were in agreement with previous papers on Syrphidae, Diptera, and Insecta mitogenomes. The gene orders are similar with the arthropod ancestral mitochondrial genome. The conserved sequence (ATGATAA) in the overlap between ATP8 and ATP6 was found in other Muscomorpha insect mitogenomes, and is known from other insects, such as Fulgoroidae (Hemiptera) and Cercopidae (Hemiptera) [66,67]. This region was thought to be translated as a bicstron [68]. Those conserved fragment might play an important role in transcription and controlling the length of PCGs [69].

PLOS ONE
Eighteen mitogenomes of Syrphidae with phylogenetic analysis of Muscomorpha

PLOS ONE
The lengths of every PCG in all 43 Syrphidae mitogenomes did not differ significantly, varying by no more than 35 bp, with the exception of COXI at 141 bp. This is expected from the relatively conserved characteristics of mitogenome PCGs. The limited length variations in all tRNAs and rRNAs observed among different species were mainly due to their stable secondary structures [89].
The most common nucleotide in the first codon position of the PCG start codon was A, followed by T in the second codon position. The stop codon was TA-or T-. The frequent use of these codons is an important reason for the A+T content being greater than the G+C content, as observed in other insect mitogenomes [90]. One reason for this base composition bias is asymmetric mutations that occurred during replication and transcription, and selective pressures after asymmetric mutations [17,74,91].
When these data were combined with other Muscomorpha mitogenomes, the resulting trees' results agreed with some previous papers and disagreed with others. The choice of dataset and phylogenetic analysis method (ML or BI) affected some results. The PCG12-ML tree was most consistent with our point of view and previous morphological classifications about Syrphidae (Fig 9) [53,57].
Our results supported the paraphyly of Aschiza and Schizophora. The Platypezoidea diverged first, sister to the other species of Muscomorpha (Fig 8). The paraphyly of Syrphoidea due to Pipinculidae being closely related to the Schizophora and Opomyzoidea (Fig 8A-8E) supports previous findings [1,2,5].
The Acalyptratae was paraphyletic in our results. The superfamilies Ephydroidea, Tephritoidea, Lauxanioidea, and Opomyzoidea were monophyletic (Fig 8B, 8C, 8D and 8F). The superfamily Sciomyzoidea was paraphyletic. The family Sepsidae clustered with Lauxanioidea, and the Sciomyzidae was sister to Pipunculidae (Fig 8B-8D). The relationships of the Acalyptratae families remain obscure, likely because few characters were accumulated since the first Schizophora radiation, when these early lineages diverged rapidly [2]. More molecular information for the Acalyptratae is needed.
The phylogenetic analyses of Syrphidae found the subfamily Eristalinae was paraphyletic in relation to Syrphinae (Fig 9), in agreement with the results of Mengual et al. [51], Skevington and Yeates [8], and Young et al. [48]. Our results supported the monophyly of the tribe Eristalini in accordance with previous studies [9,51]. The taxonomic status of some species could not be resolved by these trees. For example, in the PCGs dataset (S55 and S56 Figs), Asarkina ericetorum was most closely related to Dideoides latus, while in other trees it was more closely related to Ocyptamus sativus and Platycheirus albimanus. Also, while the topology of (Eristalinus viridis + Mallota bellus) was well supported, the position of this clade was less stable (Fig  9). The genera of Epistrophe, Eupeodes and Mallota were not restored monophyletic in our study. The mitogenome information inferred in this study is limited, warranting the sampling of more taxa to confirm the conclusions reached here.
In this study, we found three unique, conserved fragments in all syrphid mitogenomes. Phylogenetic analyses were performed based on the nucleotide data of 13 PCGs and two rRNAs from 76 Muscomorpha and three outgroup species. Our results supported the paraphyly of Aschiza and Schizophora. The Acalyptratae was paraphyletic and the relationships of its superfamilies were difficult to determine. The monophyly of Calyptratea was supported, but the relationships of Oestroidea and Muscoidea need to be reconsidered. The monophyly of Syrphidae was supported. The Syrphinae were clustered into one branch, while the paraphyly of Eristalinae was strongly supported. We discussed the phylogenetic relationship of Muscomorpha using the latest comprehensive data, which we hope will provide a base reference for further phylogenetic analysis of Muscomorpha and the Diptera.